Genome-wide analysis of the C2H2 zinc finger protein gene family and its response to salt stress in ginseng, Panax ginseng Meyer

The C2H2 zinc finger protein (C2H2-ZFP) gene family plays important roles in responses to environmental stresses and in several other biological processes in plants. Ginseng is a precious medicinal herb cultivated in Asia and North America. However, little is known about the C2H2-ZFP gene family and its functions in ginseng. Here, we identified 115 C2H2-ZFP genes from ginseng, defined as the PgZFP gene family. The family was clustered into five groups and is characterized by eight conserved motifs, with each gene containing one to six of them. The family genes are categorized into 17 gene ontology subcategories and carry numerous cis-regulatory elements responsive to a variety of biological processes, suggesting their functional differentiation. The 115 PgZFP genes were spliced into 228 transcripts at the seed-setting stage, and their expression varied dramatically across tissues, developmental stages, and genotypes; nevertheless, they form a co-expression network, suggesting their functional correlation. Furthermore, four genes of the family, PgZFP31, PgZFP78-01, PgZFP38, and PgZFP39-01, were identified as actively involved in plant response to salt stress. These results provide new knowledge on the origin, differentiation, evolution, and function of the PgZFP gene family and new gene resources for C2H2-ZFP gene research and application in ginseng and other plant species.

www.nature.com/scientificreports/ repression activities, thus responding to salt stress 14 . The OsZFP213 gene in rice was shown to interact with OsMAPK3, thus enhancing salt tolerance 15 . In soybean and Arabidopsis, GmZAT4 played an important role in PEG and NaCl stress tolerance and ABA response 16 . Accordingly, the C2H2-ZFP gene family has been identified and characterized genome-wide in several plant species, including Arabidopsis (176 C2H2-ZFP genes) 17 , rice (189) 18 , poplar (109) 19 , Medicago truncatula (218) 20 , maize (211) 21 , and soybean (321) 22 . The C2H2 zinc finger motif is usually written as X2-C-X(2-4)-C-X12-H-X(2-8)-H, where X represents any amino acid and the number following X indicates how many such residues occur 23 . In each C2H2 zinc finger, a zinc ion is tetrahedrally coordinated by two cysteine and two histidine residues, folding the finger into a compact structure formed by a β hairpin and an α helix 24 . In addition, C2H2-ZFPs contain a few conserved domains, including a DNA-binding domain, a transcription regulatory domain, and a protein interaction domain 23 . The DNA-binding domain is characterized by a highly conserved QALGGH motif in which every amino acid is important for DNA-binding activity. Absence of the QALGGH motif in a C2H2-ZFP may reduce ABA sensitivity and thereby affect stomatal size 25 , and may influence inflorescence development 26 . Another unique feature of the family is the variable length of the spacer between two adjacent zinc fingers, which may also affect DNA binding 23 . Moreover, some C2H2-ZFPs contain a highly conserved amino acid sequence, (L/F)DLN(L/F)xP, known as the EAR motif, in their carboxy-terminal region 27 . The EAR motif is the smallest known transcriptional repression domain. Changing a single residue in the EAR motif greatly reduced or abolished its transcriptional repression ability 28 . 
The three DLN residues of the hexapeptide motif are indispensable for transcriptional repression, and the presence of at least two DLN residues produced maximal repression 29 .
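To make the motif definitions above concrete, the following is a minimal sketch that scans a protein sequence with regular expressions for the consensus finger X2-C-X(2-4)-C-X12-H-X(2-8)-H, the QALGGH motif, and the EAR motif (L/F)DLN(L/F)xP. It assumes these patterns exactly as written in the text and is only an illustration; it is not the iTAK or MEME procedure used in this study, and the function name `scan_protein` and the example sequence are hypothetical.

```python
import re

# Consensus C2H2 zinc finger as written in the text: X2-C-X(2-4)-C-X12-H-X(2-8)-H
C2H2_FINGER = re.compile(r".{2}C.{2,4}C.{12}H.{2,8}H")
# Plant-specific DNA-binding motif; its H is the first zinc-binding histidine
QALGGH = re.compile(r"QALGGH")
# EAR repression motif (L/F)DLN(L/F)xP, where x is any amino acid
EAR = re.compile(r"[LF]DLN[LF].P")


def scan_protein(seq: str) -> dict:
    """Count non-overlapping C2H2 finger matches and flag QALGGH and EAR motifs.

    Hypothetical helper for illustration only; a regex scan is much cruder
    than dedicated domain-search tools.
    """
    seq = seq.upper()
    return {
        "zinc_fingers": len(C2H2_FINGER.findall(seq)),
        "has_QALGGH": QALGGH.search(seq) is not None,
        "has_EAR": EAR.search(seq) is not None,
    }


# Synthetic sequence: one finger whose 12-residue spacer ends in QALGG, then an EAR motif
example = "MS" + "CPEC" + "GKSFSSFQALGG" + "H" + "KRSK" + "H" + "LDLNLSP" + "E"
print(scan_protein(example))
# {'zinc_fingers': 1, 'has_QALGGH': True, 'has_EAR': True}
```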
Nevertheless, the C2H2-ZFP gene family has not been reported in ginseng. In the present study, we identified 228 C2H2-ZFP gene transcripts spliced from 115 C2H2-ZFP genes in Jilin ginseng, defined them as PgZFP genes, and characterized them in terms of genetic diversity, evolution, expression, functional differentiation, and gene × gene interaction. Furthermore, we identified four PgZFP genes involved in ginseng response to salt stress, confirming the role of the family in plant response to salt stress. Therefore, this study provides comprehensive insight into the PgZFP gene family in ginseng, together with the knowledge and gene resources necessary for advanced research and breeding in ginseng and related species.

Materials and methods
Plant materials. Three sets of Jilin ginseng plant materials were used for this study (Supplemental Fig. S1A). These plant materials included 14 tissues of a 4-year-old cv. Damaya plant, roots of 5-, 12-, 18-, and 25-year-old cv. Damaya plants, and 4-year-old plant roots of 42 cultivars (hereafter referred to as genotypes) coded S1 to S42 (Supplemental Table S1). The seeds of these plant materials were all from our laboratory and are available upon request. The 14 tissues of the 4-year-old cv. Damaya plant included fiber root, leg root, main root epidermis, main root cortex, arm root, rhizome, stem, leaf peduncle, leaflet pedicel, leaf blade, fruit peduncle, fruit pedicel, fruit, and seed 7 . The roots of the 5-, 12-, 18-, and 25-year-old plants were all collected from cv. Damaya. The 42 genotypes were collected from Jilin Province, China, the origin and diversity center of ginseng, where over 65% of the world's ginseng is produced. These genotypes are representative of the genetic diversity of Jilin ginseng.

Databases.
To facilitate ginseng functional genomics research, we previously developed several databases of functional genes for Jilin ginseng. These databases included the gene sequence and expression database of the above 14 tissues of the 4-year-old plant (Database A) 7 , the gene sequence and expression database of the roots of plants of four different ages (Database B) 7 , and the gene sequence and expression database of the 4-year-old plant roots of 42 genotypes (Database C) 30 . … 31 , used as queries to search ginseng Database A for C2H2-ZFP genes, and identified the putative C2H2-ZFP genes in ginseng. Then, the putative ginseng C2H2-ZFP genes were imported into iTAK (http://bioinfo.bti.cornell.edu/tool/itak) to confirm them by searching for the conserved C2H2-ZFP domain 32 . The putative C2H2-ZFP genes confirmed to contain C2H2-ZFP domains were considered ginseng C2H2-ZFP-encoding genes and defined as PgZFP genes (Supplemental Table S2). Finally, ORF finder (https://www.ncbi.nlm.nih.gov/orffinder/) was used to identify the ORFs of the PgZFP genes. The physicochemical properties of their putative proteins, including isoelectric point (pI), molecular weight (Da), grand average of hydropathicity, instability index, and aliphatic index, were predicted using ProtParam (https://web.expasy.org/protparam/).

Distribution of the PgZFP gene family in the ginseng genome, its gene number variation within the ginseng species, and its synteny with Arabidopsis. We aligned the PgZFP genes identified above to three ginseng genome assemblies 4-6 using Blastn to determine their distribution in the ginseng genome 6 and the variation in the PgZFP gene family size among genotypes. The alignment criteria were identity ≥ 99%, cover length ≥ 200 bp, and e-value ≤ 1.0E−100. 
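The alignment filter can be sketched as follows. This assumes Blastn was run with tabular output (`-outfmt 6`, whose 12 default columns are query ID, subject ID, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, and bit score) and assumes that "cover length" corresponds to the alignment length column. The helper names and example rows are hypothetical; the thresholds default to the ginseng criteria above, and the Arabidopsis criteria are passed explicitly in the usage lines.

```python
from typing import Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float  # percent identity (column 3)
    length: int    # alignment length (column 4), assumed to be the "cover length"
    evalue: float  # column 11


def parse_outfmt6(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse BLAST tabular output with the 12 default -outfmt 6 columns."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        yield Hit(f[0], f[1], float(f[2]), int(f[3]), float(f[10]))


def passes(hit: Hit, min_ident: float = 99.0, min_len: int = 200,
           max_evalue: float = 1.0e-100) -> bool:
    """Apply identity, cover-length and e-value thresholds (all inclusive)."""
    return hit.pident >= min_ident and hit.length >= min_len and hit.evalue <= max_evalue


rows = [
    "PgZFP01\tchr01\t99.6\t850\t3\t0\t1\t850\t1001\t1850\t0.0\t1550",
    "PgZFP02\tchr05\t97.0\t900\t27\t0\t1\t900\t501\t1400\t0.0\t1400",
    "PgZFP03\tchr09\t100.0\t150\t0\t0\t1\t150\t11\t160\t1e-70\t278",
]
hits = list(parse_outfmt6(rows))
print([h.query for h in hits if passes(h)])                    # ['PgZFP01']
print([h.query for h in hits if passes(h, 75.0, 80, 1e-10)])   # ['PgZFP01', 'PgZFP02', 'PgZFP03']
```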
The PgZFP genes were aligned to the Arabidopsis genome with the criteria of identity ≥ 75%, cover length ≥ 80 bp, and e-value ≤ 1.0E−10, given that ginseng is distantly related to Arabidopsis (Supplemental Fig. S1B) 33 . The distribution of the gene family in the ginseng genome and its synteny with the Arabidopsis genome were plotted using the R package circlize 34 . … aligned to each genome assembly were compared to identify the pan- and core-transcriptomes of the gene family among the three ginseng genotypes.
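A minimal sketch of the pan-/core-set comparison, assuming each assembly's filtered Blastn hits have been reduced to a set of PgZFP gene IDs: the pan set is the union across assemblies, the core set is the intersection, and the dispensable set is their difference. The assembly names and gene sets below are hypothetical.

```python
def pan_core(genes_by_assembly: dict[str, set[str]]) -> tuple[set[str], set[str], set[str]]:
    """Return (pan, core, dispensable) gene sets across genome assemblies."""
    sets = list(genes_by_assembly.values())
    if not sets:
        return set(), set(), set()
    pan = set().union(*sets)        # detected in at least one assembly
    core = set.intersection(*sets)  # detected in every assembly
    return pan, core, pan - core    # dispensable: present in some but not all


# Hypothetical hits of PgZFP genes to three assemblies
hits_by_assembly = {
    "assembly_1": {"PgZFP01", "PgZFP02", "PgZFP03"},
    "assembly_2": {"PgZFP01", "PgZFP02"},
    "assembly_3": {"PgZFP01", "PgZFP04"},
}
pan, core, dispensable = pan_core(hits_by_assembly)
print(len(pan), len(core), len(dispensable))  # 4 1 3
```

With the counts reported in the Discussion (a pan-transcriptome of 149 genes and a core of 12), the same set arithmetic gives a dispensable set of 149 − 12 = 137 genes.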
Duplication, phylogeny, and evolution of the PgZFP gene family. To determine the origin, evolution, and phylogeny of the PgZFP gene family, we screened the transcripts of all 115 PgZFP genes for ORFs using NCBI ORF finder (https://www.ncbi.nlm.nih.gov/orffinder/), calculated the Ka/Ks ratios of recently duplicated PgZFP genes, constructed their phylogenetic tree, and predicted their conserved domains. The transcripts with complete CDS were used for Ka/Ks calculation with KaKs_Calculator 35 . The time of gene duplication and divergence (T, in million years ago, MYA) was estimated as T = Ks/(2λ) × 10^−6, where Ks is the number of synonymous substitutions per synonymous site and λ is the rate of synonymous substitution per site per year, for which λ = 6.5 × 10^−9 was used [36][37][38] .
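The dating formula and the usual reading of Ka/Ks can be sketched as below, assuming Ks and Ka values have already been produced (for example by KaKs_Calculator). The ±0.05 band treated as "neutral" around Ka/Ks = 1 is an illustrative assumption, not a cutoff from this study, and the function names are hypothetical.

```python
LAMBDA = 6.5e-9  # synonymous substitutions per synonymous site per year (value used in the text)


def divergence_mya(ks: float, lam: float = LAMBDA) -> float:
    """T = Ks / (2 * lambda) * 1e-6, i.e. divergence time in million years."""
    if ks <= 0:
        raise ValueError("Ks must be positive")
    return ks / (2 * lam) * 1e-6


def selection_mode(ka: float, ks: float, tol: float = 0.05) -> str:
    """Classify Ka/Ks: < 1 purifying, about 1 neutral, > 1 positive selection.

    The +/- tol band around 1 is an assumption of this sketch.
    """
    ratio = ka / ks
    if ratio < 1 - tol:
        return "purifying"
    if ratio > 1 + tol:
        return "positive"
    return "neutral"


print(round(divergence_mya(0.30), 2))  # 23.08
print(selection_mode(0.12, 0.30))      # purifying
```

Under this rate, the 18-41 MYA duplication window reported in the Discussion corresponds to Ks of roughly 0.23 to 0.53.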
The transcript with the longest amino acid sequence of each PgZFP gene was used to construct the phylogenetic tree of the gene family. The tree was constructed using the maximum-likelihood (ML) method in MEGA X 39 with 1000 bootstrap replications. Forty-five C2H2-ZFPs from Arabidopsis were downloaded from the Plant Transcription Factor Database (http://plntfdb.bio.uni-potsdam.de/v3.0/) and used as the outgroup and evolutionary reference. MEME 40 was used to predict the conserved motifs in the PgZFP gene transcripts. The putative protein sequences of the 26 PgZFP genes that have complete conserved domains were used as representatives for motif prediction. Motif length was set to 10-50 amino acids, and the maximum number of motifs was set to 6, 8, and 10 in separate runs, while other parameters were left at their defaults.
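Selecting one representative transcript per gene, as described above, amounts to keeping the transcript with the longest putative protein. This sketch assumes the protein sequences are already in memory as (gene ID, transcript ID, protein) tuples; the IDs and sequences are hypothetical, and ties keep the first transcript seen.

```python
from typing import Iterable


def longest_per_gene(records: Iterable[tuple[str, str, str]]) -> dict[str, str]:
    """Map each gene to the transcript with the longest putative protein.

    records: (gene_id, transcript_id, protein_sequence) tuples.
    Ties keep the first transcript encountered.
    """
    best: dict[str, tuple[str, int]] = {}
    for gene, transcript, protein in records:
        length = len(protein.rstrip("*"))  # ignore a trailing stop symbol if present
        if gene not in best or length > best[gene][1]:
            best[gene] = (transcript, length)
    return {gene: tx for gene, (tx, _) in best.items()}


records = [
    ("PgZFP01", "PgZFP01-01", "MKCPEC*"),
    ("PgZFP01", "PgZFP01-02", "MKCPECGKSFSQALGGH*"),
    ("PgZFP02", "PgZFP02-01", "MSLDLNLSP"),
]
print(longest_per_gene(records))
# {'PgZFP01': 'PgZFP01-02', 'PgZFP02': 'PgZFP02-01'}
```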
Functional divergence of the PgZFP genes. Studies have documented that the types of cis-regulatory elements of a gene, such as promoter elements, are related to its biological functions, such as plant responses to hormones, abiotic and biotic stresses, and plant growth and development. We estimated the functional differentiation and divergence of the genes in the PgZFP gene family by gene ontology (GO) categorization and cis-regulatory element analysis. The PgZFP gene transcripts were GO categorized using Blast2GO V5.0 41 . Enrichment of the number of PgZFP gene transcripts in each subcategory was tested by the Chi-square test, using the categorization of the transcripts expressed in the 4-year-old ginseng plant as the control 7 . The 1500-bp upstream sequences of the PgZFP genes were analyzed, and their cis-regulatory elements were identified and classified using the PlantCARE database 42 .
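The text does not give the exact form of the Chi-square test, so the following is only a sketch under one common reading: a 2×2 contingency table comparing PgZFP transcripts with the background (the transcripts expressed in the 4-year-old plant), in or out of a given GO subcategory, using Pearson's statistic without continuity correction. The counts in the usage lines are hypothetical.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float, str]:
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 table.

    a: PgZFP transcripts in the subcategory      b: PgZFP transcripts not in it
    c: background transcripts in the subcategory d: background transcripts not in it
    Returns (chi2, p_value, direction).
    """
    n = a + b + c + d
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    if 0 in (r1, r2, c1, c2):
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / (r1 * r2 * c1 * c2)
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    expected_a = r1 * c1 / n
    if a > expected_a:
        direction = "enriched"
    elif a < expected_a:
        direction = "depleted"
    else:
        direction = "no change"
    return chi2, p, direction


chi2, p, direction = chi_square_2x2(40, 60, 1000, 9000)  # hypothetical counts
print(round(chi2, 1), direction)  # 96.5 enriched
```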
Expression characterization and co-expression network of the PgZFP gene family and its hub genes. We characterized the PgZFP gene family by analyzing the expression, heatmaps, and co-expression networks of its gene transcripts in 14 tissues of a 4-year-old plant, in 5-, 12-, 18-, and 25-year-old plant roots, and in the 4-year-old plant roots of 42 genotypes. The expression values of the PgZFP gene transcripts were extracted from Databases A, B, and C, respectively. The expression heatmaps of the genes were constructed using an R package for heatmap construction and visualization. The co-expression networks of the genes were constructed using BioLayout Express 3D Version 3.2 43 . Furthermore, we tested whether the PgZFP gene transcripts tend to form a co-expression network by Student's t-test, comparing 176 transcripts randomly selected from the 228 PgZFP gene transcripts identified in this study with the same number of transcripts randomly selected from Database A as a control. The test was carried out at a series of cutoff values from P ≤ 5.0E−02 to 1.0E−08, with 20 replicates per cutoff value. In addition, we identified the hub genes of the PgZFP gene family network using the following criteria: network cutoff P-value ≤ 0.001, connectivity ≥ 30, and hub genes accounting for ≤ 5% of the total number of genes in the network. The connectivity of a gene in a network reflects not only the degree of its co-expression correlation with other genes but also the number of genes with which it interacts in the network.
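The hub-gene criteria above can be sketched as follows, assuming the network is available as an edge list with one P-value per gene pair. How BioLayout Express 3D computes and thresholds correlations is not reproduced here; counting connectivity as the number of significant edges and ranking candidates by connectivity to enforce the 5% cap are assumptions of this sketch, and `find_hubs` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable


def find_hubs(edges: Iterable[tuple[str, str, float]], p_cutoff: float = 0.001,
              min_connectivity: int = 30, max_fraction: float = 0.05) -> list[str]:
    """Select hub nodes from a co-expression edge list.

    edges: (node_a, node_b, p_value) for each pairwise correlation test.
    Connectivity is the number of significant edges (P <= p_cutoff) per node;
    the network consists of nodes with at least one significant edge.
    """
    degree: Counter = Counter()
    for a, b, p in edges:
        if p <= p_cutoff:
            degree[a] += 1
            degree[b] += 1
    candidates = sorted((n for n in degree if degree[n] >= min_connectivity),
                        key=lambda n: (-degree[n], n))
    max_hubs = int(max_fraction * len(degree))  # hubs may be at most this share of nodes
    return candidates[:max_hubs]


# Hypothetical star-shaped network: one node linked to 40 others, plus one weak edge
edges = [("H", f"L{i}", 1e-5) for i in range(40)] + [("L0", "L1", 0.01)]
print(find_hubs(edges))  # ['H']
```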
Response of ginseng to salt stress. To examine the potential function of the PgZFP gene family in response to salt stress, we stressed ginseng with NaCl, using its adventitious roots as experimental materials. Adventitious roots 1 cm long were cultured and stressed on B5 medium containing 0, 20, 40, or 80 mM NaCl at 25 °C for 30 days. The root lengths were then measured, and the roots were quickly frozen in liquid nitrogen and stored at −80 °C for gene expression analysis.

Relative expressions of the PgZFP genes under salt stress.
We first aligned the PgZFP genes to the salt-responsive genes identified in A. thaliana, STZ and AZF. The PgZFP genes best aligned to STZ and AZF were then selected and subjected to comparative expression analysis by real-time quantitative PCR (qPCR) in ginseng adventitious roots grown with and without salt stress, to test whether genes of the PgZFP gene family respond to salt stress. The qPCR primers of the selected PgZFP genes were designed based on their sequences (Supplemental Table S3). Total RNA was isolated from the adventitious roots stressed with the above NaCl concentrations using the Trizol method. mRNA was purified from the total RNA, and first-strand cDNA was synthesized and used as the qPCR template. The CYP gene was used as the reference gene 44 . qPCR was conducted using the Applied Biosystems 7500 Real-Time System (ABI, USA) and the Ultra SYBR Mixture Kit (Low ROX) (ComWin, Beijing, China). The 2^−ΔΔCT method was used to determine the relative expression of the PgZFP genes.
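A minimal sketch of the 2^−ΔΔCT calculation, with CYP as the reference gene and the 0 mM NaCl roots as the calibrator. Averaging replicate CT values before computing ΔCT is one common convention and is assumed here; the CT values in the usage lines are hypothetical.

```python
from statistics import mean


def relative_expression(ct_target_treated: list[float], ct_ref_treated: list[float],
                        ct_target_control: list[float], ct_ref_control: list[float]) -> float:
    """Relative expression by the 2^-ddCT method.

    dCT  = CT(target) - CT(reference gene), computed from replicate means
    ddCT = dCT(salt-stressed) - dCT(0 mM NaCl control)
    """
    d_ct_treated = mean(ct_target_treated) - mean(ct_ref_treated)
    d_ct_control = mean(ct_target_control) - mean(ct_ref_control)
    dd_ct = d_ct_treated - d_ct_control
    return 2 ** (-dd_ct)


# Hypothetical CT values: target gene vs. CYP, 40 mM NaCl vs. 0 mM NaCl
fold = relative_expression([23.0, 23.0, 23.0], [20.0, 20.0, 20.0],
                           [25.0, 25.0, 25.0], [20.0, 20.0, 20.0])
print(fold)  # 4.0
```

A ΔΔCT of −2 means the target reached threshold two cycles earlier relative to CYP in stressed roots, which corresponds to a fourfold increase over the control set as "1".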

Research involving plants.
Authors confirm that all methods were performed in accordance with the relevant guidelines and regulations. All plant materials used in this study were from the authors' laboratory.

Results
Identification of the PgZFP gene transcripts and the biophysical and biochemical properties of their putative proteins. A total of 228 PgZFP gene transcripts were identified, with sequence lengths from 202 to 5327 bp and an average length of 1419 bp (Supplemental Table S2). These transcripts were spliced from 115 PgZFP genes, with 1 to 20 transcripts per gene and an average of 2 transcripts per gene. The molecular weights of the putative proteins of the PgZFP gene transcripts varied from 3808 to 177,151 Daltons (Da), their isoelectric points (pIs) ranged from 4.55 to 12.13, and their grand averages of hydropathicity (GRAVY) ranged from −1.512 to 1.253. Because only 16 of the PgZFP gene transcripts had a positive GRAVY, we speculate that most genes of the PgZFP gene family encode hydrophilic proteins. The instability indices of the PgZFP putative proteins ranged from 18.90 to 88.75, with most of them greater than 40.00, suggesting that most PgZFP genes in the family encode unstable proteins (Supplemental Table S4). … Table S5).
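GRAVY, as reported by ProtParam, is the sum of the Kyte-Doolittle hydropathy values of all residues divided by the sequence length, so a positive value indicates an overall hydrophobic protein and a negative value a hydrophilic one. The sketch below reproduces that definition; skipping non-standard characters such as a stop symbol is an assumption of this sketch, and the example peptides are hypothetical.

```python
# Kyte-Doolittle hydropathy scale
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein: str) -> float:
    """Grand average of hydropathicity: mean Kyte-Doolittle value over residues."""
    residues = [aa for aa in protein.upper() if aa in KD]  # skips '*', 'X', etc.
    if not residues:
        raise ValueError("no standard amino acids in sequence")
    return sum(KD[aa] for aa in residues) / len(residues)


for peptide in ("IVL", "KDER"):
    label = "hydrophobic" if gravy(peptide) > 0 else "hydrophilic"
    print(peptide, round(gravy(peptide), 2), label)
# IVL 4.17 hydrophobic
# KDER -3.85 hydrophilic
```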

Distribution of the
To determine the origin and phylogeny of the PgZFP gene family in Jilin ginseng (Supplemental Fig. S1A), the putative proteins of the longest transcripts of all 115 PgZFP genes were analyzed (Supplemental Table S4). Forty-five representative A. thaliana (At) C2H2-ZFP genes selected from their phylogenetic tree (Supplemental Table S6) were used as the outgroup (Supplemental Fig. S1B) 33 . The PgZFP genes were classified, together with the AtC2H2-ZFP genes, into five clusters, defined as I through V (Fig. 2). Cluster I included 26 PgZFP genes grouped with members of the C1-2i, C1-3i, C1-4i, and C1-5i clusters of the AtC2H2-ZFP genes. Cluster II consisted of 25 PgZFP genes grouped with members of the C2, C3, A3, and A4 subfamilies of the AtC2H2-ZFP genes. Cluster III comprised 46 PgZFP genes grouped with members of the C1-1i, C1-2i, A1-a, A1-b, and A1-c clusters of the AtC2H2-ZFP genes. Cluster IV contained 11 PgZFP genes grouped with members of the AtC2H2-ZFP A2 subfamily and B family. Cluster V comprised 7 PgZFP genes grouped with the only member from the AtC2H2-ZFP C2 subfamily 17 . These results suggest the ancient origin and diversity of the PgZFP gene family.
Moreover, we analyzed the conserved motifs of the PgZFP gene family using the putative proteins of 26 representative PgZFP genes that have complete conserved domains within their ORFs, with a motif length of 10-50 amino acids and a maximum motif number of 6, 8, or 10. Setting the maximum motif number to 6, 8, or 10 identified six, eight, or ten conserved motifs, respectively. The six or eight motifs identified with a maximum motif number of 6 or 8 were completely consistent with the first six or eight of the ten motifs identified with a maximum motif number of 10. Figure 3A shows the eight conserved motifs of the PgZFP genes identified with a maximum motif number of 8, defined as Motif 1 through Motif 8. Motif 1 was identified in 88.5% of the 26 PgZFP genes examined (Fig. 3B), indicating the evolutionary conservation of the PgZFP gene family. The QALGGH sequence in Motifs 1 and 2 plays an important role in DNA binding 8 . The EXEXXAXCLXXL sequence (L-box) of Motif 4 is a leucine-rich region considered to play an important role in protein-protein interactions 23 . Motif 5, an EAR motif, includes the core DLNL sequence and has been shown to play an important role in transcriptional repression and abiotic stress response 29,45 . … (Fig. 4A). The MF category included 88 transcripts, of which 27 were MF-specific, 17 were categorized into both the MF and CC categories, 13 into both the MF and BP categories, and 31 into all three primary categories. The BP category included 49 transcripts, of which 3 were BP-specific, 2 were categorized into both the BP and CC categories, and the remaining transcripts were categorized with MF, or with MF and CC, as above. 
The CC category included 56 transcripts, of which 6 were CC-specific and the remaining transcripts were categorized with MF, with BP, or with MF and BP, as above. The BP category was further divided into eight subcategories at level 2, of which three were significantly depleted of PgZFP genes (P ≤ 0.05 or 0.01) (Fig. 4B). The MF category was divided into three subcategories, catalytic activity, binding, and transcription regulator activity; PgZFP genes were depleted in catalytic activity (P ≤ 0.01) and enriched in binding and transcription regulator activity (P ≤ 0.01). The CC category was divided into six subcategories, of which two were depleted of and two enriched in PgZFP genes (P ≤ 0.05 or 0.01). These results suggest the functional differentiation, divergence, and specialization of the PgZFP genes. Furthermore, we examined whether the GO categorization of the PgZFP gene family was consistent across tissues, developmental stages, and genotypes. The PgZFP gene family was consistently categorized into the same 17 level-2 subcategories across tissues, developmental stages, and genotypes (Supplemental Fig. S2), although the number of PgZFP gene transcripts in each of the 17 subcategories varied substantially among them.
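The primary-category counts reported above are internally consistent, which a short sketch can check by summing the exclusive regions of the MF/BP/CC Venn diagram. The region counts are taken from the text; the union of 99 transcripts is derived arithmetically, assuming the seven regions listed are exhaustive.

```python
from collections import Counter

# Exclusive regions of the MF/BP/CC Venn diagram, counts as reported in the text
regions = {
    frozenset({"MF"}): 27,
    frozenset({"BP"}): 3,
    frozenset({"CC"}): 6,
    frozenset({"MF", "CC"}): 17,
    frozenset({"MF", "BP"}): 13,
    frozenset({"BP", "CC"}): 2,
    frozenset({"MF", "BP", "CC"}): 31,
}


def category_totals(regions: dict[frozenset, int]) -> tuple[dict[str, int], int]:
    """Sum exclusive Venn regions into per-category totals and the union size."""
    totals: Counter = Counter()
    for categories, n in regions.items():
        for cat in categories:
            totals[cat] += n
    return dict(totals), sum(regions.values())


totals, union = category_totals(regions)
print(totals["MF"], totals["BP"], totals["CC"], union)  # 88 49 56 99
```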
Next, we analyzed the cis-regulatory elements of the PgZFP genes, such as promoter elements, because they are related to gene expression activity and potential biological function. Because only 73 of the 115 PgZFP genes were aligned to the Chinese ginseng genome 6 , the 1500-bp upstream sequences of these 73 genes were searched for cis-regulatory elements. A total of 3709 cis-regulatory elements were identified and classified into 53 types, such as TATA-box, CAAT-box, SARE, ARE, AuxRE, and MBS. These elements are responsive to hormones, environmental stresses, and plant growth (Supplemental Fig. S3A). Of the 3709 cis-regulatory elements, 276 were responsive to hormones, including auxin, gibberellin, salicylic acid, abscisic acid, and MeJA (Supplemental Fig. S3B); 149 to environmental stresses, including defense, light, low temperature, and drought (Supplemental Fig. S3C); and 82 to plant growth, such as seed, meristem expression, and endosperm expression (Supplemental Fig. S3D).

Expression characteristics of the PgZFP gene transcripts in different tissues, at different developmental stages, and across genotypes.
We characterized the expression of the PgZFP gene family spatially, temporally, and across genotypes collected from the origin and diversity center of Jilin ginseng. Analysis of randomly selected transcripts from the PgZFP gene family showed that the expression levels of different transcripts varied dramatically within a tissue, a developmental stage, or a genotype. Nevertheless, the expression of a given transcript was relatively consistent across tissues, developmental stages, and genotypes, although it also varied among them (Supplemental Fig. S4). Of the 228 transcripts of the PgZFP gene family identified in this study, only 54.4-71.9% were expressed in any single tissue (Supplemental Fig. S5A) and 27.6-33.8% at any single developmental stage of the root (Supplemental Fig. S5B). Forty-six of them were expressed at all four developmental stages of the roots, and 13, 5, 4, and 14 were expressed specifically in 5-, 12-, 18-, and 25-year-old roots, respectively (Supplemental Fig. S5C). Across the genotypes studied, 49.1-64.5% of the PgZFP gene transcripts were expressed in the 4-year-old plant root of a given genotype (Supplemental Fig. S5D). Expression heatmap analysis showed that the expression of the vast majority of transcripts in the PgZFP gene family was independently regulated across tissues, developmental stages, and genotypes (Fig. 5). Only PgZFP94 and PgZFP80-05 were co-regulated across the tissues analyzed (Fig. 5A); none of the genes was co-regulated across the developmental stages of roots (Fig. 5B); and PgZFP36-02 and PgZFP36-03 were co-regulated across genotypes (Fig. 5C).
Co-expression network of the PgZFP gene transcripts and its potential hub genes. The above phylogenetic analysis, GO categorization, and cis-regulatory element examination indicated that the PgZFP gene family has substantially differentiated in sequence and function. The question is whether any relationship remains among the genes of the family. Therefore, we conducted a co-expression network analysis of the PgZFP gene transcripts. All 228 transcripts of the PgZFP gene family formed a single strong co-expression network (Fig. 6A) consisting of 228 nodes and 4745 edges grouped into eight clusters (Fig. 6B). In comparison, the network of the PgZFP gene family was much more robust than that of randomly selected ginseng unknown transcripts (Fig. 6C,D), and statistical tests showed that the PgZFP gene transcripts were more likely to form a co-expression network than the randomly selected transcripts (Fig. 6E,F). These results suggest that, although the gene family has substantially differentiated in sequence and function, the expression of its genes remains correlated, indicating their functional correlation. Further analysis revealed that eight of the 115 PgZFP genes, PgZFP79, PgZFP82, PgZFP114, PgZFP87, PgZFP01-02, PgZFP48, PgZFP63-04, and PgZFP30, likely play central roles in the network at P ≤ 0.001; these genes are therefore likely hub genes, each with a connectivity of 30-43 (Supplemental Fig. S6). Moreover, when we aligned these eight PgZFP genes to the Arabidopsis genome, only PgZFP79 and its paralog, PgZFP79P, aligned to At3G48430 (REF6) (see Fig. 1). This gene is a positive regulator of flowering in an FLC-dependent pathway in Arabidopsis 46 .
Response of the PgZFP genes to salt stress in ginseng. Because the above cis-regulatory element analysis showed that the genes of the PgZFP gene family are likely responsive to environmental stresses, we further studied the response of the gene family to environmental stresses, especially salt stress. … 6 and subjected to cis-regulatory element analysis, showing its potential responsiveness to hormones, environmental stresses, and growth. These four PgZFP genes all belong to Cluster III of the family tree. Ginseng adventitious roots were used for the experiment. Figure 7A,B shows that the ginseng adventitious roots were sensitive to salt stress (NaCl). At 40 mM NaCl, the growth of ginseng roots was significantly inhibited (Fig. 7A), as indicated by shorter roots (P ≤ 0.05) (Fig. 7B). The relative expression levels of all four genes were up-regulated by salt (Fig. 7C). At 20 mM NaCl, the relative expression of two of the four genes began to increase significantly. At 40 mM NaCl or higher, the relative expression of all four genes increased significantly (P ≤ 0.05) or extremely significantly (P ≤ 0.01), suggesting that at least four genes in the PgZFP gene family are involved in plant response to salt stress.

Discussion
The C2H2-ZFP gene family has been shown in several species, including Arabidopsis, rice, tomato, and soybean, to play important roles in plant responses to abiotic and biotic stresses, plant growth and development, and hormone signal transduction [8][9][10][11][13][14][15][16] . However, the gene family remained uncharacterized in ginseng. This study is the first to identify and characterize the gene family genome-wide in ginseng. A total of 228 C2H2-ZFP gene transcripts, alternatively spliced from 115 C2H2-ZFP genes, were identified and defined as PgZFP genes. Therefore, the PgZFP gene family consists of at least 115 gene members. This family size is comparable with those in poplar (109) 19 and tomato (99) 47 , but smaller than those in Arabidopsis (176) 17 , rice (189) 18 , Medicago truncatula (218) 20 , maize (211) 21 , and soybean (321) 22 , indicating that the PgZFP gene family is moderately sized. Nevertheless, this number of PgZFP genes was identified in the Chinese ginseng cv. Damaya. Pan-transcriptome analysis revealed that the number of genes in the family varies substantially among genotypes of P. ginseng, with a pan-transcriptome of 149 PgZFP genes and a core-transcriptome of only 12 genes, suggesting that ginseng has a dispensable transcriptome of at least 137 PgZFP genes. The PgZFP gene family is distributed on all 24 chromosomes of the Chinese ginseng genome, but only 17 (23%) of its 73 mapped PgZFP genes are syntenic with 11 Arabidopsis C2H2-ZFP genes. Analysis of the 46 PgZFP genes with complete CDS showed that 40 (87%) of them were duplicated 18-41 MYA, suggesting that gene duplication played a major role in the expansion of the gene family. The Ka/Ks ratio analysis indicates that purifying and neutral selection drove the evolution of the family. 
The PgZFP gene family is classified into five clusters along with the C2H2-ZFP genes from Arabidopsis, suggesting that it is an ancient gene family that originated before the divergence of ginseng and Arabidopsis. Each cluster of the PgZFP gene family has specific conserved motifs distinguishing it from the other clusters; nevertheless, most genes of the family contain the highly conserved QALGGH motif or a variant of it, such as R/KALGGH. A previous study showed that changing any amino acid in QALGGH may affect DNA-binding ability and that mutation of the Q residue greatly reduced it 48 . C2H2-ZFP genes containing both the QALGGH and I/D/FLN motifs play important roles in plant responses to biotic and abiotic stresses 23 .
The PgZFP gene family was categorized into 17 subcategories at Level 2 and found to have 53 types of cis-regulatory elements responsive to multiple biological processes, suggesting that, during evolution, substantial changes have occurred in the structure and upstream regions of PgZFP genes and these … In comparison, most (106) of the poplar C2H2-ZFP genes are also involved in binding 19 , and all annotated tomato C2H2-ZFP genes are involved in binding, including nucleic acid binding, organic cyclic compound binding, and heterocyclic compound binding 47 . This suggests that the PgZFP genes play a role in transcriptional regulation by binding to downstream target genes 28,29,52 . The expression analyses of the PgZFP gene family yielded several interesting findings. First, most genes in the PgZFP gene family were expressed at relatively low levels in a tissue, at a developmental stage, and in the root of a genotype. The PgZFP genes that were actively expressed in one tissue, at one developmental stage, or in the root of one genotype also tended to be actively expressed in other tissues, at other developmental stages, and in the roots of other genotypes. Second, the expression relationships of the genes in the family are not consistent with their phylogenetic relationships determined by amino acid sequence similarity, suggesting that genes with similar sequences may not have similar expression patterns. Third, the expression of transcripts spliced from the same gene may differ substantially in a tissue, at a developmental stage, and in the root of a genotype. 
Finally, of the 115 genes in the PgZFP gene family, the expression of only a few is co-regulated, while that of the vast majority is independently regulated. Nevertheless, the genes of the PgZFP gene family are more likely than randomly selected genes to form a co-expression network, in which some genes play central roles, indicating that the members of the family remain functionally correlated 49 . Previous studies showed that the C2H2-ZFP gene family plays important roles in growth and development and in plant responses to hormones and biotic and abiotic stresses 23 . The cis-regulatory element analysis of the PgZFP genes in the present study provides another line of evidence for these roles in ginseng. Moreover, four genes of the PgZFP gene family, PgZFP31, PgZFP78-01, PgZFP38, and PgZFP39-01, were identified as involved in response to salt stress, suggesting that the PgZFP gene family indeed plays roles in plant responses to abiotic stresses, particularly salt stress, in ginseng. Interestingly, the four PgZFP genes all belong to Cluster III of the gene family tree. Of the four salt-responsive PgZFP genes examined, PgZFP31 and PgZFP78-01 have zinc finger structures similar to that of STZ, which is involved in plant response to salt stress in Arabidopsis 14 . Both PgZFP31 and PgZFP78-01 are C1-2i type zinc finger proteins, are highly similar in the DLN motif, and contain the FDLNI/L motif. A similar result was obtained for PgZFP38 and PgZFP39-01 in comparison with AZF1, which is also involved in plant response to salt stress in Arabidopsis 14,53 . These four PgZFP genes therefore provide gene resources for salt tolerance research and genetic improvement in ginseng.

The expression of the genes in salt-stressed roots is presented relative to that in roots without salt stress (0 mM NaCl), set as "1". The t-test was used to test the mean difference in gene expression between control and salt-stressed roots. "*", significant at P ≤ 0.05; "**", significant at P ≤ 0.01.

Conclusion
The PgZFP gene family is an ancient gene family consisting of approximately 115 PgZFP genes distributed among all 24 chromosomes of the ginseng genome. It originated before the divergence of ginseng and Arabidopsis, and its genes have substantially diverged in amino acid sequence and function since they duplicated 18-41 MYA. Nevertheless, conserved motifs exist among the putative proteins of the family. Different members of the family are expressed differently in a tissue, at a developmental stage, and in a genotype, and the expression of the same gene varies across tissues, developmental stages, and genotypes, further indicating functional differentiation. Even so, the genes in the family tend to be co-expressed, forming a co-expression network, which suggests their functional correlation. Biologically, the PgZFP gene family plays important roles in plant response to salt stress in ginseng, and four PgZFP genes were identified as involved in this response.

Data availability
The data used for this study have been deposited at Sequence Read Archive (SRA) of National Center for Biotechnology Information (NCBI), BioProject PRJNA302556; and at Gene Expression Omnibus (GEO) of NCBI, SRP066368 and SRR13131364-SRR13131405. The plant materials are available from the corresponding authors, upon request.